Define and formulate the likelihood function (\(\mathcal{L}_n(\theta)\)) as the joint density of IID observations
Differentiate between MLE and Maximum A Posteriori (MAP) estimates, specifically how MAP incorporates a Bayesian prior
Understand different hypothesis testing approaches
Estimating distribution parameters from data
In this session we focus on how to estimate the distribution parameters from the IID observed data \(X=x_1,\dots,x_n\)
We assume \(X\) follows a specific distribution; how can we know the parameter?
If it follows a normal distribution, what is the procedure?
Intuitive guess
Using the sample mean and sample variance as the best guesses
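A minimal sketch of this intuitive approach, using simulated data (the true parameters `loc=2.0`, `scale=1.5` are made up for illustration): the sample mean and sample variance serve as plug-in guesses for a Normal distribution's parameters.

```python
import numpy as np

# Simulate IID observations from a Normal distribution with known
# parameters, then pretend we don't know them.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=1000)

mu_hat = data.mean()      # intuitive guess for the mean
sigma2_hat = data.var()   # intuitive guess for the variance

print(mu_hat, sigma2_hat)
```

With enough data, these guesses land close to the true values (2.0 and \(1.5^2 = 2.25\)); the sections below show that for the Normal distribution they are in fact the maximum likelihood estimates.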
Likelihood
The likelihood function (\(\mathcal{L}_n(\theta)\)) is the joint density of IID observations \(X=x_1,\dots,x_n\) with PDF \(f(x;\theta)\), treated as a function of the parameter \(\theta\) given the fixed data.
Likelihood is the product of the probabilities (or densities) of each observation in \(X\), viewed as a function of different \(\theta\).
This is the same quantity that appears as the probability of the observed data when computing a Bayesian posterior.
Example: if \(X \sim \text{Bern}(p)\) and we observed \([1,0,0,1,1]\),
then the likelihood is
\[
\mathcal{L}_n(p) = p^3(1-p)^2
\]
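This likelihood can be evaluated numerically. A short sketch for the sample \([1,0,0,1,1]\), trying a few candidate values of \(p\) (the candidate grid is arbitrary):

```python
import numpy as np

x = np.array([1, 0, 0, 1, 1])  # observed Bernoulli sample

def likelihood(p):
    # Product of Bern(p) probabilities: p for each 1, (1-p) for each 0,
    # giving p^3 * (1-p)^2 for this sample.
    return np.prod(np.where(x == 1, p, 1 - p))

for p in [0.2, 0.5, 0.6, 0.8]:
    print(p, likelihood(p))
```

Among these candidates the likelihood is largest at \(p = 0.6\), which matches the fraction of ones in the sample.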
Maximum likelihood
The ideal \(\theta\) is the one (\(\hat{\theta}\)) that maximizes the likelihood (i.e., makes the observed data most probable). We call this the Maximum Likelihood Estimate (MLE).
Calculus tells us the maximum of a function occurs where its derivative is zero: \(\mathcal{L}_n'(\hat{\theta}) = 0\).
Any positive constant multiplier \(c\) won’t affect the location of the maximum, so constants can be ignored.
Taking the logarithm of the likelihood is often easier for calculation. We define the log-likelihood function as \(\ell_n(\theta)=\ln(\mathcal{L}_n(\theta))\). Reasons:
Numerical stability
Logarithms transform products into sums.
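The numerical-stability point can be demonstrated directly: with many observations, the raw likelihood (a product of many small probabilities) underflows to zero in floating point, while the log-likelihood (a sum of logs) stays representable. The sample size and \(p = 0.5\) below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.integers(0, 2, size=2000)  # 2000 simulated Bernoulli observations
p = 0.5

# Raw likelihood: a product of 2000 factors of 0.5 -> ~1e-602, underflows to 0.0
raw = np.prod(np.where(x == 1, p, 1 - p))

# Log-likelihood: a sum of 2000 logs, perfectly representable
log_lik = np.sum(np.log(np.where(x == 1, p, 1 - p)))

print(raw)       # 0.0 due to floating-point underflow
print(log_lik)   # 2000 * ln(0.5), about -1386.29
```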
Example: Bernoulli MLE
If \(X \sim \text{Bern}(p)\) and we observed \([1,0,0,1,1]\)
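Working through this example with the log-likelihood:

\[
\ell_n(p) = \ln\!\left(p^3(1-p)^2\right) = 3\ln p + 2\ln(1-p)
\]

Setting the derivative to zero:

\[
\frac{d\ell_n}{dp} = \frac{3}{p} - \frac{2}{1-p} = 0
\quad\Longrightarrow\quad
3(1-p) = 2p
\quad\Longrightarrow\quad
\hat{p} = \frac{3}{5}
\]

The MLE is simply the sample proportion of ones, matching the intuitive guess.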
Requires a sufficient sample size for the CLT to ensure \(\hat{\theta}\) is asymptotically Normal.
The p-value is \(P(|W| \geq |W_{obs}| \mid H_0)\), not the probability that the null hypothesis is true.
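A sketch of the two-sided p-value computation, assuming the test statistic \(W\) is asymptotically standard Normal under \(H_0\) (the usual setting when \(\hat{\theta}\) is asymptotically Normal); the observed value `w_obs = 1.96` is made up for illustration:

```python
from math import erf, sqrt

def std_normal_cdf(x):
    # Phi(x), the standard Normal CDF, via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

w_obs = 1.96  # hypothetical observed test statistic

# P(|W| >= |W_obs|) under H0: both tails of the standard Normal
p_value = 2.0 * (1.0 - std_normal_cdf(abs(w_obs)))
print(round(p_value, 3))  # 0.05
```

Note that the computation conditions on \(H_0\) being true throughout, which is exactly why the result cannot be read as the probability that \(H_0\) is true.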
Likelihood ratio test (LRT)
A more general approach, which accommodates multivariate parameters (i.e., a vector-valued parameter \(\theta=(\theta_1,\dots,\theta_k)\)), is the likelihood ratio test. The general form is